42 research outputs found

Automatic schema matching utilizing hypernymy relations extracted from the web

Author: Portisch Jan
Publication date: 01/01/2018

This thesis explores how a large corpus of Is-a statements can be exploited for the task of schema matching.

MAnnheim DOCument Server

Exploiting general-purpose background knowledge for automated schema matching

Author: Portisch Jan
Publication date: 01/01/2022

The schema matching task is an integral part of the data integration process. It is usually the first step in integrating data. Schema matching is typically very complex and time-consuming. It is, therefore, for the most part carried out by humans. One reason for the low degree of automation is the fact that schemas are often defined with deep background knowledge that is not itself present within the schemas. Overcoming the problem of missing background knowledge is a core challenge in automating the data integration process. In this dissertation, the task of matching semantic models, so-called ontologies, with the help of external background knowledge is investigated in depth in Part I. Throughout this thesis, the focus lies on large, general-purpose resources since domain-specific resources are rarely available for most domains. Besides new knowledge resources, this thesis also explores new strategies to exploit such resources. A technical base for the development and comparison of matching systems is presented in Part II. The framework introduced here allows for simple and modularized matcher development (with background knowledge sources) and for extensive evaluations of matching systems. Knowledge graphs, which have grown significantly in size in recent years, are one of the largest structured sources of general-purpose background knowledge. However, exploiting such graphs is not trivial. In Part III, knowledge graph embeddings are explored, analyzed, and compared. Multiple improvements to existing approaches are presented. In Part IV, numerous concrete matching systems which exploit general-purpose background knowledge are presented. Furthermore, exploitation strategies and resources are analyzed and compared. This dissertation closes with a perspective on real-world applications.

MAnnheim DOCument Server

ALOD2vec matcher results for OAEI 2021

Author: Paulheim Heiko
Portisch Jan
Publication venue: RWTH Aachen
Publication date: 01/01/2022

This paper presents the results of the ALOD2vec Matcher in the Ontology Alignment Evaluation Initiative (OAEI) 2021. The matching system exploits a Web-scale dataset, i.e. WebIsALOD, as a background knowledge source. In order to make use of the dataset, the RDF2vec approach is applied to derive embeddings for each concept available in the dataset. ALOD2vec Matcher previously participated in the OAEI 2018 and 2020 campaigns. This is the system’s third participation.

MAnnheim DOCument Server

Entity Type Prediction Leveraging Graph Walks and Entity Descriptions

Author: Alam Mehwish
Biswas Russa
Paulheim Heiko
Portisch Jan
Sack Harald
Publication date: 12/09/2022

The entity type information in Knowledge Graphs (KGs) such as DBpedia, Freebase, etc. is often incomplete due to automated generation or human curation. Entity typing is the task of assigning or inferring the semantic type of an entity in a KG. This paper presents GRAND, a novel approach for entity typing leveraging different graph walk strategies in RDF2vec together with textual entity descriptions. RDF2vec first generates graph walks and then uses a language model to obtain embeddings for each node in the graph. This study shows that the walk generation strategy and the embedding model have a significant effect on the performance of the entity typing task. The proposed approach outperforms the baseline approaches on the benchmark datasets DBpedia and FIGER for entity typing in KGs for both fine-grained and coarse-grained classes. The results show that the combination of order-aware RDF2vec variants together with the contextual embeddings of the textual entity descriptions achieves the best results.

KITopen

Fine-TOM matcher results for OAEI 2021

Author: Knorr Leon
Portisch Jan
Publication venue: RWTH Aachen
Publication date: 01/01/2022

In this paper, the Fine-Tuned Transformers for Ontology Matching (Fine-TOM) matching system is presented along with the results it achieved during its first participation in the Ontology Alignment Evaluation Initiative (OAEI) campaign (2021). The system uses the publicly available albert-base-v2 model, which has been fine-tuned with a training dataset that includes 20% of each reference alignment from the Anatomy, Conference, and Knowledge Graph track, as well as a wide variety of generated false examples. The model is then used by a separate matching pipeline which calculates a confidence score for each correspondence. In the submitted Docker container, only the matching pipeline with an already fine-tuned model is included.

MAnnheim DOCument Server

Wiktionary matcher results for OAEI 2020

Author: Paulheim Heiko
Portisch Jan
Publication venue: RWTH
Publication date: 01/01/2020

This paper presents the results of the Wiktionary Matcher in the Ontology Alignment Evaluation Initiative (OAEI) 2020. Wiktionary Matcher is an ontology matching tool that exploits Wiktionary as an external background knowledge source. Wiktionary is a large lexical knowledge resource that is collaboratively built online. Multiple current language versions of Wiktionary are merged and used for monolingual ontology matching by exploiting synonymy relations and for multilingual matching by exploiting the translations given in the resource. This is the second OAEI participation of the matching system. Wiktionary Matcher has been improved and is the best performing system on the knowledge graph track this year.

MAnnheim DOCument Server

Putting RDF2vec in order

Author: Paulheim Heiko
Portisch Jan
Publication venue: RWTH Aachen
Publication date: 01/01/2021

MAnnheim DOCument Server

KGvec2go – Knowledge graph embeddings as a service

Author: Hladik Michael
Paulheim Heiko
Portisch Jan
Publication venue: ELRA
Publication date: 01/01/2020

In this paper, we present KGvec2go, a Web API for accessing and consuming graph embeddings in a light-weight fashion in downstream applications. Currently, we serve pre-trained embeddings for four knowledge graphs. We introduce the service and its usage, and we show further that the trained models have semantic value by evaluating them on multiple semantic benchmarks. The evaluation also reveals that the combination of multiple models can lead to a better outcome than the best individual model.

Comment: to be published in the Proceedings of the International Conference on Language Resources and Evaluation (LREC) 202

arXiv.org e-Print Archive

MAnnheim DOCument Server